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ABSTRACT 

We propose a coercive approach to simultaneously register and seg¬ 
ment multi-modal images which share similar spatial structure. Reg¬ 
istration is done at the region level to facilitate data fusion while 
avoiding the need for interpolation. The algorithm performs alter¬ 
nating minimization of an objective function informed by statistical 
models for pixel values in different modalities. Hypothesis tests are 
developed to determine whether to refine segmentations by splitting 
regions. We demonstrate that our approach has significantly better 
performance than the state-of-the-art registration and segmentation 
methods on microscopy images. 

Index Terms — Image registration, Image segmentation, Multi¬ 
modality, Microscopy image. Hypothesis test 

1. INTRODUCTION 

This paper addresses the two problems of multi-modal image 
registration and image segmentation within a single framework. 
Multi-modal registration refers to registration of images acquired 
by different sensor/scanner types. It has been applied to many ar¬ 
eas, e.g. medical images, microscopy images, and radar images, to 
combine information from different modalities and provide more 
comprehensive understanding about the true object. Image segmen¬ 
tation, the partitioning of an image into meaningful regions, is an 
important step in image analysis and understanding. 

In this work, we focus on multi-modal registration and segmen¬ 
tation as applied to scanning electron microscope (SEM) images of 
materials; the methods to be discussed are equally applicable to other 
multi-modal images that share spatial structure. SEM techniques are 
widely used in materials science for material characterization, for 
example detection of defects that may cause fatigue when operating. 
Segmentation is of interest to map locations of grains, uniform re¬ 
gions occupied by continuous crystal lattices, since grain structure is 
a principal factor in determining the properties of a polycrystalline 
material such metallic or ceramic materials m Multi-modal reg¬ 
istration is desired because different scanning electron modalities 
carry complementary information Em. For example, Backscat- 
tered Electrons (BSE) provide information about topography and lo¬ 
cal fine-scale surface texture ID while Electron Backscatter Diffrac¬ 
tion (EBSD) measures crystal orientation which is useful in locating 
grains and grain boundaries (S). 

Multi-modal registration is made challenging by the fact that im¬ 
ages from different modalities may have different resolutions, val- 
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ues that lie in different spaces (e.g. scalars vs. vectors), and dif¬ 
ferent levels of distortion. In SEM for instance, these differences 
are due to different electron beam geometries, sensors, and record¬ 
ing electronics. Furthermore, there is often no complete forward 
model that jointly characterizes the multi-modal signals, nor a trans¬ 
formation model that adequately describes the distortion. In these 
circumstances, pixel-level registration methods isiEiia, i.e., those 
that establish correspondences between pixels, usually resort to in¬ 
terpolation, a somewhat ad-hoc solution that may bias the resulting 
images toward excessive smoothness. On the other hand, segmenta¬ 
tion of multi-modal images, if done independently for each modality 
using existing methods (Hiliisi, may fail due to low contrast in 
some modalities and face difficulties in identifying correspondences 
between segmented regions from different modalities. 

In this work, we propose a coercive region-level approach to 
simultaneously register and segment images of different modalities 
that share similar spatial structure. The algorithm is initialized by 
segmenting one image by a standard method and coarsely mapping 
the result onto the other image. Then the two images are registered 
at the region level and further segmented through alternating mini¬ 
mization of a statistically-based objective function. There are several 
advantages of our approach. First, the region-level approach is free 
of pixel value interpolation and its inherent assumptions. Second, it 
takes advantage of modalities with better discriminative power, im¬ 
proving the overall segmentation result. The approach also preserves 
region correspondences to facilitate data fusion in I11II12I . Lastly, 
both registration and refinement of segmentation are driven by sta¬ 
tistical models. In particular, we propose hypothesis tests to detect 
boundaries that are missed by the initial segmentation due to low 
contrast. 

The paper is organized as follows. In Section]^ we describe the 
objective function, statistical models for data from different modal¬ 
ities, and optimization methods for the region-level registration al¬ 
gorithm. In Section 1^ we focus on hypothesis testing for detect¬ 
ing missing boundaries. Section shows experimental results for 
synthetic and real materials images and compares several different 
approaches. Section|^concludes this paper. 

2. ALGORITHM FRAMEWORK AND MODELS 
2.1. Objective Function 

We assume that there are two images from different modalities. A 
pixel location {x,y) G Ti,i G {1,2} is represented as vector p, 
where Ti is the spatial domain for the i-th modality. The pixel value 
at p is given by a function /i(p). Note that the values /i(pi) and 
L(P 2 ) may lie in different spaces. The region-level registration 




problem is to find partitions ofXi, Si = where each seg¬ 

ment Ri- is a collection of connected pixel locations and Ni is the 
number of segments, to minimize the following objective function: 

U{Sl,S2) = J{Sl,Il) + JiS2,l2) + \D{Sl,S2), (1) 

where J{Si,Ii) is the intra-modal energy function that measures 
how well the segmentation fits the image data and D{Si, S2) is the 
inter-modal energy function that coerces the segmentation results to 
be topologically similar, motivated by the fact that they share the 
same underlying physical structure. The parameter A controls the 
relative importance of the two terms. 

In this paper, we define the inter-modal energy D{Si, S2) to be 
the number of segment boundaries that are present in one modal¬ 
ity but not the other. This number is easily tracked because our al¬ 
gorithm maintains tight correspondences between segments in the 
two images. More generally, segment structure can be represented 
by a connected adjacency graph and the inter-modal energy can be 
any function which measures the topological distance between two 
graphs. The intra-modal energy J{Si,Ii) is defined by the statistical 
models described in the following subsection. 


2.2. Statistical Models for Pixel Values 


In the materials context, each segment Rij corresponds to a grain, 
a continuous crystal lattice. Motivated by this, we assume that the 
observed values within a segment are similar and can be modeled by 
i.i.d. random variables following a distribution with the same param¬ 
eters. In the sequel, the image modality subscript i is suppressed for 
simplicity. Let the probability density function (PDF) of the distribu¬ 
tion for one modality be denoted by /(/(p)|q:), where a represents 
the parameters specifying the model. The intra-modal energy func¬ 
tion in given a set of segments S = {i?i, R 2 , ■■■, Rn} is defined 
as: 


j{s,i) = Yl 




■ ^ogf{I{p)\aj) + e f 

pGfl, 


dl 


( 2 ) 


where dRj is the boundary of region Rj with counter-clockwise 
definition and Aj is the maximum-likelihood (ML) estimate for the 
parameters of region Rj. The first term is the negative log-likelihood 
of observations which penalizes grain inhomogeneity and the second 
term penalizes the boundary length, where e controls the level of 
smoothness. 

In this paper, we consider the EBSD and BSE images of one 
section of a material as our input. Note that other image types can 
be used directly given properly defined statistical models. Eor BSE 
images, since the pixel values are scalars, the intensities in the same 
grain region are modeled by a univariate Gaussian ,o-j), where 
HjjUj are the mean and variance of Rj. Notice that fij and cr| are 
unknown parameters to be estimated from image data. 

For EBSD images, the pixel values characterize the local crys¬ 
tal orientation, which can be represented by Euler angles Oil, 
Rodrigues vectors m or quaternions fSl We choose the unit- 
quaternion representation, i.e. a q G S^, the 3-dimensional unit 
sphere in This allows use of the von Mises-Fisher (VMF) dis¬ 
tribution in directional statistics QSl, a natural generalization of the 
multivariate Gaussian distribution to the sphere C (here 

p = 4). However, the VMF distribution cannot be used directly due 
to crystal symmetry, which causes there to be more than one quater¬ 
nion representation corresponding to a single crystal orientation. 
The ambiguity in representation may lead to a large artificial diver¬ 
sity of pixels within the same grain, resulting in an over-segmented 


result. To cope with this problem, we have proposed a VMF mixture 
distribution model which accounts for symmetry in our previous 
work 03 . To briefly describe the model, let O be a group of quater¬ 
nion matrices {Qi,. .., Qm} which define the symmetry actions 
that map a quaternion q to its symmetric equivalents. The PDF 
of the pure VMF distribution is 0(x; /x, k) — Cp(k) exp(Kfj,^x), 
where x, /x G /x is the mean direction, k is the concentra¬ 
tion parameter, CpIk) — and Ir,(.) is the modified 

Bessel function of the first kind with order p. The density function 
of the VMF mixture distribution is then given by 


/(x; /X, Ac) = ^ 

772 = 1 

The parameters /x and k can be estimated from image data through 
the Expectation-Maximization algorithm derived in 03 . 

2.3. Optimization 

We minimize the objective function by alternately fixing 
and solving 0 and 0. 

= arg mm J(S',/a) -b \D[Sf\S) (4) 

= argmm -b AL>(S', (5) 

where k is the iteration index. Typically 2-3 iterations suffice. 

To initialize the algorithm, the initial segmentation of the first 
modality, , is obtained by using a suitable image segmentation 
method. For example, the Voronoi-based method in na can be ap¬ 
plied to EBSD images and the Stabilized Inverse Diffusion Equa¬ 
tion method to BSE images (9). Since EBSD data provides crystal 
orientation which defines grain regions more accurately, we choose 
to start with EBSD segmentation in this paper. Next, to account 
for global misalignment and any resolution difference between the 
modalities, we determine an affine transformation by treating the 
material sample as a binary image and registering its outer bound¬ 
ary from one modality to the other using the Elastix toolbox EH. 
The transformation is then used to map onto the other modality, 
yielding the initial segmentation S^'^. Note that due to localized dis¬ 
tortions between the modalities, the initial segmentation S^'’ may be 
misaligned with the image values as shown in Fig |l(a)| and therefore 
needs to be registered. 

Optimizing 0 and 0 is done in two steps. The first step is 
to consider splitting regions in the current segmentation by adding 
boundaries. In Section|^ we propose a hypothesis testing approach 
for this purpose based on the statistical model The second step 
is to register the misaligned boundaries. Due to the fact that adjust¬ 
ing boundary positions does not change the topology of the segment 
structure, the inter-modal energy function D{Si, S2) is not changed 
in this step, reducing 0 and 0 to the intra-modal energy function 
J {S, I) alone, which is given by our statistical model. We use the 
Region Competition algorithm 1201 to minimize J{S, 1 ). This algo¬ 
rithm applies gradient descent to move pixels comprising the bound¬ 
aries dRj along their respective normal directions. There are two 
forces driving the movement corresponding to the two terms in 
the statistics force which comes from the distribution model for the 
pixel values, and the smoothing force which drives the boundary to 
have smaller curvature. More details are given in I20l . 
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(a) Misalignment (b) Missing Edge 

Fig. 1: (a) shows a misaligned boundary while (b) indicates a miss¬ 
ing boundary. Notice that in both situations, pixel values within the 
regions outlined in red are multi-modally distributed. 

3. HYPOTHESIS TESTS EOR MISSING BOUNDARIES 

This section elaborates upon the first step in solving 0 and 0, 
namely hypothesis testing to determine whether a region R £ S 
should be split into two based on the observed image values. We re¬ 
fer to this as the missing boundary problem. Recall that R may come 
from the initial segmentation result from another modality and may 
not fit the current image data. Figure[T]shows examples of misalign¬ 
ment and a missing boundary. One can see that both of the situations 
have multi-modal distributions of pixel values within the initially de¬ 
fined regions but only Fig |l(b)| shows a missing edge that should be 
identified. Therefore, a region R is declared as having a missing 
boundary if and only if it satisfies the following two conditions: (1) 
The pixel values are multi-modally distributed. (2) The multi-modal 
distribution is unlikely to be caused by misalignment. We develop 
two hypothesis tests for the two criteria. The first hypothesis test uses 
the Generalized Likelihood Ratio Test (GLRT) 1211 to test whether 
the pixel values are multi-modally distributed. The second hypothe¬ 
sis test differentiates misalignment from a missing boundary. 

3.1. Hypothesis Test for Multi-Modality 

Recall from Section|2^that the set of pixel values within a region R 
are modeled by a distribution f{I{R)-, a) with unknown parameters 
a, where I{R) = {/(p)}pgij are the observed pixel values in R. 
Assume there exists a boundary t/i which partitions R into two sub- 
regions R+, R- with parameters «+, a-. The two hypotheses are 
Ho', region R is indeed a single region, i.e. a+ = and Hi: R 
consists of two regions. The GLRT has the following form: 


We use the Region Growing algorithm l20l to locate the optimal 
boundary ip. The algorithm partitions a region starting from two 
seed pixels and greedily adds neighboring pixels until all pixels in 
the region are chosen. 

3.1.1. Multi-modality Test for Univariate Gaussian Distribution 

The GLR A.glr\iIi given boundary ip for testing mean equality be¬ 
tween two Gaussian distributions has the following form f22\ : 

^GLR\yj = ( ^2 I ”+ -2 j ’ 

where (Tq , , a?, are the ML estimators of the variances under the 

null and alternative hypothesis and n,n+,n_ are the numbers of 
pixels in regions R, R+, i?_. The optimization of the boundary ip 
then takes the form 

Ip = argrnin ^ (/(p) - fi+f + ^ (/(p) - fi-f 

pGRj^ pGR- 

where fl± is the ML estimate of the mean in R±. 

3.1.2. Multi-modality Test for von Mises-Fisher Distribution 

The VMF mixture distribution is reduced to single VMF through 
transoforming the samples by the symmetry operator towards the 
mean direction estimated by the EM algorithm. According to the 
derivation of the ML estimators in II231 . Rglr\iI> has the following 
form: 

Acifliv. = exp(Ki(||r+|| -F |lr_||) - Ko||ro||) (9) 

Cp{Ror 

where r+ = Ep6fl+ ^(p),r- = Ep6H_ ■^(p),ro = r+ -F r_ 
andKi = A“^((||r+|| -F ||r_ ||)/n), kq = Af^{\\vo\\/n), Ap{x) = 
Ip/ 2 {x)/Ip/ 2 -i{x). The optimization over ip is 

Ip = argmaxn(logCp(/ti) -F kiAp{ki)) 
i’ 

= argmax|| ^ /(p)|| + || ^ /(p)|| 
pGR-\. pGR- 



log Acii? 


log max 

y, 


max{„+,a_} f{I{R);a+,OL-,ip) 
max{„^=„_} f{I{R)',a+,a-,ip) 


The last equality comes from the fact that n(log Cp{x) -F xAp{x)) 
and Ap(x) are monotonically increasing functions of x. 


max log/(7(p)|Q:+) + Y log/(^(p)l«-) „ th • T tf A/i' r i 

^ ^ 3.2. Hypothesis Test for Misalignment 


- 5Zlog/(7(p)|a) '^hI ^ 

pGR 

where 01 , 61 +, a- are the ML estimates of the parameters under the 
null and alternative hypotheses and A is the coefficient in 0. The 
GLRT can be viewed as a trade-off between the improvement in the 
intra-modal energy and the penalty of A paid in the inter-modal en¬ 
ergy when inserting a boundary. The boundary length penalty is 
neglected here for simplicity but can be included easily. 

In the following subsections, we derive the GLRT for univariate 
Gaussian and VMF distributions given the boundary ip. We only dis¬ 
cuss the equal variance (concentration parameter) case due to the pa¬ 
per length constraint. These expressions supply the objective func¬ 
tion, denoted as to be maximized with respect to ip in 0. 


For regions that pass the previous multi-modality test (Hi declared 
in 0), we perform a second hypothesis test to determine whether the 
multi-modal distribution is due to Hq: boundary misalignment, or 
Hi: a missing boundary. Since in most cases, misalignment causes 
only a small portion of pixels to differ from the majority, one naive 
test is to set a threshold on the ratio of the size of the smaller region 
to the whole region: 

( 11 ) 

However, since region size can vary over several orders of mag¬ 
nitude, the same absolute amount of misalignment (in pixels) can 
result in very different size ratios, making it hard to set a univer¬ 
sal threshold. Therefore, we propose an adaptive threshold which 
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Fig. 2: Misalignment caused by displacement for (a) a realistic re¬ 
gion shape; (b) a simplified circular model. 

incorporates region size. Boundary misalignment is modeled by 
a displacement in position (see Fig |2(a)| >, where the displacements 
{dx,dy) are bivariate Gaussian with zero mean and covariance 
Sd = ct^T -2 and I 2 is the 2x2 identity matrix. For simplicity, 
the reg ion is m odeled as a circle with radius r (Fig. |2(b)^ , where 
r = ^\R\/'k is the equivalent radius of region R. Based on these 
assumptions, the test statistic in jrg can be formulated as the fol¬ 
lowing function of d = given r: 

T = fr{d) = 1 - - ciiccos{^) + - d2/4 

TT 2r Tvr^ 

The second line follows because fr is an increasing function. Since 
the displacement d follows a Rayleigh((Td) distribution, given the 
user specified false positive rate a, we set 

a^P{d> r^'\Ho) = Qiv) = P{T > Mv')\Ho) 

=> = fr{Q~^{a)) 
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Fig. 3: The proposed coercive approach (red line) has much higher 
boundary overlapping rate than other approaches since it is able to 
detect missing boundaries and register misaligned boundaries. 



where (5(.) is the Rayleigh tail distribution. As a result, the threshold 
is adaptively determined by a and the equivalent radius r. 

4. EXPERIMENTS 

4.1. Boundary Detection Accuracy on Simulated Data 

In this section, we compare grain boundary detection performance 
on simulated EBSD and BSE images using three different ap¬ 
proaches: A. Segment the BSE and EBSD images separately by 
suitable existing segmentation algorithms fTol [Tsl : B. Segment 
EBSD and register the boundaries onto BSE using a B-Spline de¬ 
formation model and the mutual information criterion 013 ; C. 
The proposed coercive registration/segmentation algorithm with 
A = 0.15, e = 25, a = 0.05. 

The grain shapes in the testing data are taken from real mi¬ 
croscopy images downloaded from BlueQuartz CD and segmented 
by their Dream3D toolbox. For each slice, some of the grains are 
randomly selected and displaced to produce boundary misalignment 
according to the Gaussian displacement model with = 3 (pixels). 
This creates the ground truth boundaries for evaluation. The pixel 
values for BSE and EBSD are generated from Gaussian and VMF 
distributions with random mean and variance/concentration for each 
grain region. 

To evaluate the boundary detection accuracy, we use the “over¬ 
lapping rate”. Let B(w) be the set of boundary pixel locations with 
boundary width w, which is obtained by image dilation with fil¬ 
ter disk radius w/2. The overlapping rate is defined as 0(w) — 
jBT(w) n B(w)j/jBT('w) U B{'w)\, where Bt{w),B{w) are the 
ground truth and estimated boundary. 

Eigure|^ shows the overlapping rate of the three approaches for 
different boundary widths. Independent segmentation has the worst 
performance since it does not make use of shared sub-structure be¬ 
tween modalities. With B-spline registration, there is some improve¬ 
ment but it is still not satisfactory, especially for small w. The 


Fig. 4: The registered boundaries (blue lines) fit the BSE image val¬ 
ues much better than the initial boundaries from EBSD (red lines). 
The proposed approach is also able to detect and locate missing 
boundaries within grain regions (green lines). 

proposed coercive registration approach with hypothesis testing is 
able to accurately register misaligned boundaries and detect miss¬ 
ing edges. Therefore, it has much better boundary detection perfor¬ 
mance. 

4.2. Results on Real Microscopy Data 

We apply the proposed method to the IN 100 data set which contains 
170 slices of EBSD and BSE images of a Ni-base alloy. Eigure|^ 
shows one registration/segmentation result overlaid on the BSE im¬ 
age. The red lines are the initial boundaries obtained by the EBSD 
segmentation and affine-transformed to match BSE. The blue lines 
are the realigned boundaries and the green lines are the missing 
boundaries detected by the hypothesis tests. The initial red lines 
are misaligned with the BSE image values but are corrected by our 
registration algorithm. Using statistical hypothesis tests, we are also 
able to detect and locate missing boundaries in some grain regions. 
These results in real data demonstrate that the proposed approach 
can accurately register boundaries and segment grain regions at the 
same time. 

5. CONCLUSION 

In this work, we proposed a coercive registration/segmentation algo¬ 
rithm for multi-modal images. The algorithm alternately utilizes in¬ 
formation from one modality to help segment the image in the other 
modality, resulting in significant performance improvement in both 
modalities. The proposed hypothesis test based on statistical models 
of pixel values can accurately detect and locate missing boundaries 
between regions. Furthermore, our approach identifies and preserves 

















all of the correspondences between regions in different modalities, 
which is important for fusing information after registration. The ex¬ 
periment results on simulated and real microscopy images show that 
our approach is able to effectively correct misaligned grain bound¬ 
aries and detect missing boundaries within grain regions. 
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